term and definition
Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction
Martin-Boyle, Anna, Head, Andrew, Lo, Kyle, Sidhu, Risham, Hearst, Marti A., Kang, Dongyeop
Mathematical symbol definition extraction is important for improving scholarly reading interfaces and scholarly information extraction (IE). However, the task poses several challenges: math symbols are difficult to process as they are not composed of natural language morphemes; and scholarly papers often contain sentences that require resolving complex coordinate structures. We present SymDef, an English language dataset of 5,927 sentences from full-text scientific papers where each sentence is annotated with all mathematical symbols linked with their corresponding definitions. This dataset focuses specifically on complex coordination structures such as "respectively" constructions, which often contain overlapping definition spans. We also introduce a new definition extraction method that masks mathematical symbols, creates a copy of each sentence for each symbol, specifies a target symbol, and predicts its corresponding definition spans using slot filling. Our experiments show that our definition extraction model significantly outperforms RoBERTa and other strong IE baseline systems by 10.9 points with a macro F1 score of 84.82. With our dataset and model, we can detect complex definitions in scholarly documents to make scientific writing more readable.
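The input construction described in the abstract (mask every symbol, copy the sentence once per symbol, mark one target per copy) might be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `$...$` symbol delimiter, the `SYMBOL` and `[TARGET]` placeholder strings, and the function name `make_symbol_instances` are all assumptions introduced here. The slot-filling model that would consume these copies is not shown.

```python
import re

# Assumption: math symbols appear as inline LaTeX delimited by single dollar signs.
SYMBOL_PATTERN = re.compile(r"\$[^$]+\$")

# Hypothetical placeholder strings; the paper's actual mask tokens are not given here.
MASK_TOKEN = "SYMBOL"
TARGET_TOKEN = "[TARGET]"


def make_symbol_instances(sentence):
    """Return one masked copy of `sentence` per symbol, each marking one target."""
    matches = list(SYMBOL_PATTERN.finditer(sentence))
    instances = []
    for target_index, target_match in enumerate(matches):
        pieces = []
        cursor = 0
        for i, match in enumerate(matches):
            # Copy the plain text preceding this symbol unchanged.
            pieces.append(sentence[cursor:match.start()])
            # Replace the symbol with the target marker or the generic mask.
            pieces.append(TARGET_TOKEN if i == target_index else MASK_TOKEN)
            cursor = match.end()
        pieces.append(sentence[cursor:])
        instances.append({"target": target_match.group(), "text": "".join(pieces)})
    return instances


example = "Let $a$ and $b$ be the width and height, respectively."
for instance in make_symbol_instances(example):
    print(instance["target"], "->", instance["text"])
```

Each copy would then go to a span-prediction model that labels the definition tokens for the marked target only. Handling one target per copy is what allows overlapping definition spans in "respectively" constructions to be predicted separately.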
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Unifying data and AI terms for all - ITU Hub
The world is witnessing rapid technological advances in the fields of data science and artificial intelligence (AI). From helping fight climate change to addressing all the other sustainable development goals of the United Nations, valuable use cases show how cutting-edge data and AI applications can improve our daily lives. At the same time, public awareness initiatives are still behind the curve, leaving many people feeling ambivalent about AI. Moreover, for non-technical readers, disparate definitions of data and AI terms can impede easy understanding of these dynamic fields. Despite global summits, educational publications, and ample media coverage, the fields of AI and data science stand to benefit from an agreed set of accessible definitions and terminologies.
A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency
Veyseh, Amir Pouran Ben, Dernoncourt, Franck, Dou, Dejing, Nguyen, Thien Huu
Definition Extraction (DE) is one of the well-known topics in Information Extraction that aims to identify terms and their corresponding definitions in unstructured texts. This task can be formalized either as a sentence classification task (i.e., containing term-definition pairs or not) or a sequential labeling task (i.e., identifying the boundaries of the terms and definitions). The previous works for DE have only focused on one of the two approaches, failing to model the interdependencies between the two tasks. In this work, we propose a novel model for DE that simultaneously performs the two tasks in a single framework to benefit from their interdependencies. Our model features deep learning architectures to exploit the global structures of the input sentences as well as the semantic consistencies between the terms and the definitions, thereby improving the quality of the representation vectors for DE. Besides the joint inference between sentence classification and sequential labeling, the proposed model is fundamentally different from the prior work for DE in that the prior work has only employed the local structures of the input sentences (i.e., word-to-word relations), and not yet considered the semantic consistencies between terms and definitions. In order to implement these novel ideas, our model presents a multi-task learning framework that employs graph convolutional neural networks and predicts the dependency paths between the terms and the definitions. We also seek to enforce the consistency between the representations of the terms and definitions both globally (i.e., increasing semantic consistency between the representations of the entire sentences and the terms/definitions) and locally (i.e., promoting the similarity between the representations of the terms and the definitions). The extensive experiments on three benchmark datasets demonstrate the effectiveness of our approach.
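The two task formulations named in the abstract can be illustrated on a single sentence. The sketch below is not the paper's model. It only shows, under assumptions made here, how the two label types relate: token-level BIO tags for the sequential labeling view, and a sentence-level flag derived from them for the classification view. The `TERM` and `DEF` label names, the BIO scheme, and the helper names are illustrative choices.

```python
def bio_tags(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


def sentence_label(tags):
    """Sentence-level view: True if the sentence holds both a term and a definition."""
    has_term = any(tag.endswith("-TERM") for tag in tags)
    has_definition = any(tag.endswith("-DEF") for tag in tags)
    return has_term and has_definition


tokens = "A graph is a set of nodes and edges .".split()
# Hypothetical annotation: "graph" is the term, "a set of nodes and edges" its definition.
spans = [(1, 2, "TERM"), (3, 9, "DEF")]

tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
print("contains term-definition pair:", sentence_label(tags))
```

Because the sentence-level label is fully determined by the token-level tags, the two tasks are interdependent, which is the property the joint model is described as exploiting.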
- North America > United States > Oregon (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Information Technology > Security & Privacy (1.00)
- Media > News (0.69)